Missing data and the design of phylogenetic analyses
Abstract
Concerns about the deleterious effects of missing data may often determine which characters and taxa are included in phylogenetic analyses. For example, researchers may exclude taxa lacking data for some genes or exclude a gene lacking data in some taxa. Yet, there may be very little evidence to support these decisions. In this paper, I review the effects of missing data on phylogenetic analyses. Recent simulations suggest that highly incomplete taxa can be accurately placed in phylogenies, as long as many characters have been sampled overall. Furthermore, adding incomplete taxa can dramatically improve results in some cases by subdividing misleading long branches. Adding characters with missing data can also improve accuracy, although there is a risk of long-branch attraction in some cases. Consideration of how missing data does (or does not) affect phylogenetic analyses may allow researchers to design studies that can reconstruct large phylogenies quickly, economically, and accurately.
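The exclusion decisions described above usually start from a simple count of how complete each taxon and each character is. The following is a minimal sketch of that bookkeeping, not a method taken from the paper. The toy matrix, the taxon names, the function names, and the choice to treat both `?` and `-` as missing are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumption: '?' and '-' both count as missing data in this toy example.
MISSING_SYMBOLS = frozenset("?-")


def _check_aligned(matrix: Dict[str, str]) -> int:
    """Return the number of characters, ensuring every row has equal length."""
    lengths = {len(seq) for seq in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all taxa must have the same number of characters")
    return lengths.pop()


def completeness_by_taxon(matrix: Dict[str, str]) -> Dict[str, float]:
    """Fraction of characters with data for each taxon (row)."""
    n_chars = _check_aligned(matrix)
    return {
        taxon: sum(c not in MISSING_SYMBOLS for c in seq) / n_chars
        for taxon, seq in matrix.items()
    }


def completeness_by_character(matrix: Dict[str, str]) -> List[float]:
    """Fraction of taxa with data for each character (column)."""
    n_chars = _check_aligned(matrix)
    n_taxa = len(matrix)
    return [
        sum(seq[i] not in MISSING_SYMBOLS for seq in matrix.values()) / n_taxa
        for i in range(n_chars)
    ]


# Hypothetical three-taxon, eight-character matrix.
toy = {
    "taxonA": "ACGTACGT",  # complete
    "taxonB": "ACGT????",  # second half missing
    "taxonC": "AC??AC??",  # scattered missing cells
}

per_taxon = completeness_by_taxon(toy)
per_character = completeness_by_character(toy)
print(per_taxon)
print(per_character)
```

Here `taxonB` and `taxonC` are each half complete. A summary like this only describes the matrix; as the abstract argues, a low completeness value alone is not necessarily a reason to exclude a taxon or character.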
Related articles
Missing data and the accuracy of Bayesian phylogenetics
The effect of missing data on phylogenetic methods is a potentially important issue in our attempts to reconstruct the Tree of Life. If missing data are truly problematic, then it may be unwise to include species in an analysis that lack data for some characters (incomplete taxa) or to include characters that lack data for some species. Given the difficulty of obtaining data from all characters...
Phylogenetic relationships in Ranunculus species (Ranunculaceae) based on nrDNA ITS and cpDNA trnL-F sequences
The genus Ranunculus L., with a worldwide distribution, is the largest member of the Ranunculaceae. Here, nuclear ribosomal internal transcribed spacer (ITS) sequence data and chloroplast trnL-F sequence data were used to analyze phylogenetic relationships among members of the annual and perennial (Group Praemorsa, Group Rhizomatosa, Group Grumosa and Group non-Grumosa) species of Ranunculus...
A Bayesian Approach to Estimate Parameters of a Random Coefficient Transition Binary Logistic Model with Non-monotone Missing Pattern and some Sensitivity Analyses
A transition binary logistic model with random coefficients is proposed to model the unemployment status of household members in two seasons, spring and summer. Data correspond to the labor force survey performed by Statistical Center of Iran in 2006. This model is introduced to take into account two kinds of correlation in the data, one due to the longitudinal nature o...
Should genes with missing data be excluded from phylogenetic analyses?
Phylogeneticists often design their studies to maximize the number of genes included but minimize the overall amount of missing data. However, few studies have addressed the costs and benefits of adding characters with missing data, especially for likelihood analyses of multiple loci. In this paper, we address this topic using two empirical data sets (in yeast and plants) with well-resolved phy...
A taxonomic study of cyanobacteria in wheat fields adjacent to industrial areas in Yazd province (Iran)
Culturing, isolation, purification, and identification of cyanobacteria collected from wheat field soil, in five stations around the industrial areas in Yazd province (Iran) were conducted in this study. Identification of taxa was based on morphology and molecular methods. Cluster analysis and principal component analyses performed using SPSS software and rate of resemblance among the taxa were...
Journal title:
- Journal of biomedical informatics
Volume 39, Issue 1
Pages -
Publication date: 2006